Texture Feature
TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration

Neural Information Processing Systems

Quantization is an effective way to compress neural networks: by reducing the bit width of the parameters, the processing efficiency of neural network models on edge devices can be notably improved. Most conventional quantization methods use real datasets to optimize the quantization parameters and to fine-tune the model. Because real samples carry unavoidable privacy and security issues, these real-data-driven methods are no longer applicable in such settings. A natural alternative is therefore to introduce synthetic samples for zero-shot quantization (ZSQ).
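As a concrete illustration of the bit-width reduction the abstract describes, the sketch below applies per-tensor symmetric uniform quantization to a float weight vector and dequantizes it back. This is a generic textbook scheme, not TexQ's actual calibration method; the function name and 8-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int = 8):
    """Symmetric uniform quantization: map float weights to signed
    integers of the given bit width, then map them back to floats."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax                   # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(int)
    return q, q.astype(np.float32) * scale           # int codes, reconstruction

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, w_hat = quantize_dequantize(w, bits=8)
# q is [50, -127, 2, 100]; w_hat closely reconstructs w
```

The gap between `w` and `w_hat` is the quantization error that real or synthetic calibration data is used to minimize.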



SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

Chen, Zisheng, Wang, Chunwei, Chen, Xiuwei, Xu, Hang, Han, Jianhua, Liang, Xiaodan

arXiv.org Artificial Intelligence

We present SemHiTok, a unified image tokenizer built on a semantic-guided hierarchical codebook that provides consistent discrete feature representations for multimodal understanding and generation tasks. Recently, unified multimodal large language models (MLLMs) for understanding and generation have sparked exploration within the research community. Previous works attempt to train a unified image tokenizer by combining loss functions for semantic feature reconstruction and pixel reconstruction. However, because multimodal understanding and generation prioritize different levels of features, such joint training faces significant challenges in achieving a good trade-off. SemHiTok addresses this challenge with a semantic-guided hierarchical codebook that builds texture sub-codebooks on top of a pre-trained semantic codebook. This design decouples the training of semantic reconstruction from pixel reconstruction and equips the tokenizer with low-level texture feature extraction capability without degrading its high-level semantic feature extraction ability. Our experiments demonstrate that SemHiTok achieves an excellent rFID score at 256×256 resolution compared with other unified tokenizers, and exhibits competitive performance on multimodal understanding and generation tasks.
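To make the hierarchical-codebook idea concrete, here is a minimal two-stage nearest-neighbour lookup: a feature is first matched against a semantic codebook, and its residual is then encoded by a texture sub-codebook attached to the chosen semantic code. All sizes and the random codebooks are invented for illustration; this is not SemHiTok's trained tokenizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 semantic codes, each with its own 8-entry texture sub-codebook.
semantic_codebook = rng.normal(size=(4, 16))          # (K_sem, D)
texture_subcodebooks = rng.normal(size=(4, 8, 16))    # (K_sem, K_tex, D)

def hierarchical_quantize(z: np.ndarray):
    """Two-stage lookup: pick the nearest semantic code first, then
    encode the residual with that code's own texture sub-codebook."""
    sem_idx = int(np.argmin(np.linalg.norm(semantic_codebook - z, axis=1)))
    residual = z - semantic_codebook[sem_idx]
    sub = texture_subcodebooks[sem_idx]
    tex_idx = int(np.argmin(np.linalg.norm(sub - residual, axis=1)))
    z_hat = semantic_codebook[sem_idx] + sub[tex_idx]  # coarse + fine reconstruction
    return (sem_idx, tex_idx), z_hat

z = rng.normal(size=16)
(sem_idx, tex_idx), z_hat = hierarchical_quantize(z)
```

The point of the hierarchy is visible in the code: the semantic index can serve understanding tasks on its own, while the texture index adds the low-level detail generation needs.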


Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition

Agashe, Atharva, Carreiro, Davelle, Van Dine, Alexandra, Peeples, Joshua

arXiv.org Artificial Intelligence

Underwater acoustic target recognition (UATR) is crucial for applications such as environmental monitoring, exploration, and ship noise characterization, aiding marine resource management and ocean-based technologies that enhance ocean monitoring [1], [2]. Passive sonar uses external acoustic signals to identify underwater objects without emitting sound [2]. Spectrograms, generated through signal processing techniques such as the Short-Time Fourier Transform (STFT) and Mel-frequency spectrograms, transform signals into visual representations, facilitating complex pattern extraction from acoustic data [3]-[5].

Deep learning models such as convolutional neural networks (CNNs) excel in feature representation and transfer learning, adapting well to underwater acoustics when pre-trained on large vision datasets [11]-[13]. Similarly, pre-trained audio neural networks (PANNs) [14], trained on a large audio dataset (AudioSet [15]), have proven effective for passive sonar classification, where data scarcity is a challenge [16]. Moreover, transformer-based models, including vision transformers (ViTs) [17] and audio spectrogram transformers (ASTs) [18],
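The STFT-based spectrogram mentioned above can be sketched in a few lines of numpy: Hann-windowed frames are transformed with an FFT and the magnitudes form the time-frequency image fed to CNNs. The window and hop sizes are illustrative defaults, not the paper's settings.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude STFT: slice the signal into overlapping Hann-windowed
    frames, FFT each frame, and keep the magnitudes."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (n_fft // 2 + 1, n_frames)

# A 440 Hz tone sampled at 8 kHz: energy concentrates in one frequency bin.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

With these parameters the bin spacing is 8000/256 = 31.25 Hz, so the 440 Hz tone peaks at bin 14, which is the kind of localized pattern the downstream classifiers learn from.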


Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data

Mori, Masaya, Omae, Yuto, Koyama, Yutaka, Hara, Kazuyuki, Toyotani, Jun, Okumura, Yasuo, Hao, Hiroyuki

arXiv.org Artificial Intelligence

As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, datasets of endomyocardial biopsy specimens often have small sample sizes and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. Furthermore, model designs that contribute toward improving generalization performance are examined by applying feature selection (FS) and dimensional compression (DC) to several ML models. The obtained results were verified by visualizing the inter-class distribution differences and conducting statistical hypothesis testing based on texture features. Additionally, they were evaluated using predictive performance across different model designs with varying combinations of FS and DC (applied or not) and decision boundaries. The obtained results confirmed that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to the sample size is high, a multi-step process involving FS and DC improved the generalization performance, with the linear kernel support vector machine achieving the best results. This process was demonstrated to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model for its rapid adoption in medical practice.
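The multi-step FS-then-DC design described above maps naturally onto a scikit-learn pipeline. The sketch below chains univariate feature selection, PCA compression, and a linear-kernel SVM on synthetic data; the sample counts, `k`, and component numbers are stand-ins for a small-sample, high-dimensional setting, not the paper's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small-sample, high-dimensional setting: 40 samples, 100 features.
X, y = make_classification(n_samples=40, n_features=100, n_informative=8,
                           random_state=0)

# Feature selection, then dimensional compression, then a linear-kernel SVM.
model = Pipeline([
    ("fs", SelectKBest(f_classif, k=20)),   # keep the 20 most discriminative features
    ("dc", PCA(n_components=5)),            # compress them to 5 components
    ("clf", SVC(kernel="linear")),
])
model.fit(X, y)
```

Fitting selection and compression inside one pipeline also keeps both steps out of any held-out fold, which matters most exactly when the feature-to-sample ratio is high.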


A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion

Zhang, Lixia, Liu, Tianxu, Shen, Kaihui, Chen, Cheng

arXiv.org Artificial Intelligence

With the rapid advancement of Internet technology, the threat of malware to computer systems and network security has intensified. Malware affects individual privacy and security and poses risks to critical infrastructures of enterprises and nations. The increasing quantity and complexity of malware, along with its concealment and diversity, challenge traditional detection techniques. Static detection methods struggle against variants and packed malware, while dynamic methods face high costs and risks that limit their application. Consequently, there is an urgent need for novel and efficient malware detection techniques to improve accuracy and robustness. This study first employs the MinHash algorithm to convert binary files of malware into grayscale images, followed by the extraction of global and local texture features using the GIST and LBP algorithms. Additionally, the study utilizes IDA Pro to decompile and extract opcode sequences, applying N-gram and TF-IDF algorithms for feature vectorization. The fusion of these features enables the model to comprehensively capture the behavioral characteristics of malware. In terms of model construction, a CNN-BiLSTM fusion model is designed to simultaneously process image features and opcode sequences, enhancing classification performance. Experimental validation on multiple public datasets demonstrates that the proposed method significantly outperforms traditional detection techniques in terms of accuracy, recall, and F1 score, particularly in detecting variants and obfuscated malware with greater stability. The research presented in this paper offers new insights into the development of malware detection technologies, validating the effectiveness of feature and model fusion, and holds promising application prospects.
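The binary-to-grayscale-image step can be illustrated with a direct byte-to-pixel mapping: one byte becomes one 8-bit pixel, rows have a fixed width, and the tail is zero-padded. This is a simplified stand-in for the paper's MinHash-based imaging, shown only to make the representation concrete; the row width is an arbitrary choice.

```python
import numpy as np

def bytes_to_grayscale(blob: bytes, width: int = 16) -> np.ndarray:
    """Reinterpret a binary as an 8-bit grayscale image: one byte per
    pixel, fixed row width, trailing bytes padded with zeros."""
    data = np.frombuffer(blob, dtype=np.uint8)
    height = -(-len(data) // width)                  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[:len(data)] = data
    return padded.reshape(height, width)

img = bytes_to_grayscale(bytes(range(40)), width=16)  # 40 bytes -> 3x16 image
```

Texture descriptors such as GIST and LBP are then computed on images like `img`, turning code similarity into visual similarity that a CNN can exploit.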


Electrooptical Image Synthesis from SAR Imagery Using Generative Adversarial Networks

Rosario, Grant, Noever, David

arXiv.org Artificial Intelligence

The utility of Synthetic Aperture Radar (SAR) imagery in remote sensing and satellite image analysis is well established, offering robustness under various weather and lighting conditions. However, SAR images, characterized by their unique structural and texture characteristics, often pose interpretability challenges for analysts accustomed to electro-optical (EO) imagery. This work compares state-of-the-art Generative Adversarial Networks (GANs), including Pix2Pix, CycleGAN, S-CycleGAN, and two novel dual-generator architectures, one utilizing partial convolutions and one utilizing transformers. These models are designed to progressively refine the realism of the translated optical images, thereby enhancing the visual interpretability of SAR data. We demonstrate the efficacy of our approach through qualitative and quantitative evaluations, comparing the synthesized EO images with actual EO images in terms of visual fidelity and feature preservation. The results show significant improvements in interpretability, making SAR data more accessible for analysts familiar with EO imagery. Furthermore, we explore the potential of this technology in various applications, including environmental monitoring, urban planning, and military reconnaissance, where rapid, accurate interpretation of SAR data is crucial. Our research contributes to the field of remote sensing by bridging the gap between SAR and EO imagery, offering a novel tool for enhanced data interpretation and broader application of SAR technology in various domains.

INTRODUCTION

Synthetic Aperture Radar (SAR) systems are capable of creating high-resolution remote sensing images of the earth's surface from satellites and aircraft. These images offer several key advantages over standard electro-optical (EO) images, most significantly the ability to penetrate clouds and operate independently of daylight, which has led to SAR systems being deployed extensively in various fields, including environmental monitoring, natural disaster assessment, military reconnaissance, and geological mapping [1]. Figure 1 shows the benefit of a SAR image when cloud coverage is present. Despite these advantages, SAR images pose significant challenges and still have drawbacks compared to EO images, specifically regarding human interpretability.


Coral Model Generation from Single Images for Virtual Reality Applications

Fu, Jie, Fu, Shun, Grierson, Mick

arXiv.org Artificial Intelligence

With the rapid development of VR technology, the demand for high-quality 3D models is increasing. Traditional methods struggle with efficiency and quality in large-scale customization. This paper introduces a deep-learning framework that generates high-precision 3D coral models from a single image. Using the Coral dataset, the framework extracts geometric and texture features, performs 3D reconstruction, and optimizes design and material blending. Advanced optimization and polygon count control ensure shape accuracy, detail retention, and flexible output for various complexities, catering to high-quality rendering and real-time interaction needs. The project incorporates Explainable AI (XAI) to transform AI-generated models into interactive "artworks," best viewed in VR and XR. This enhances model interpretability and human-machine collaboration. Real-time feedback in VR interactions displays information like coral species and habitat, enriching user experience. The generated models surpass traditional methods in detail, visual quality, and efficiency. This research offers an intelligent approach to 3D content creation for VR, lowering production barriers, and promoting widespread VR applications. Additionally, integrating XAI provides new insights into AI-generated visual content and advances research in 3D vision interpretability.


Multi-scale HSV Color Feature Embedding for High-fidelity NIR-to-RGB Spectrum Translation

Zhai, Huiyu, Chen, Mo, Yang, Xingxing, Kang, Gusheng

arXiv.org Artificial Intelligence

The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities between NIR inputs and RGB outputs. Existing methods thus fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping process into three sub-tasks: NIR texture maintenance, coarse geometry reconstruction, and RGB color prediction. Accordingly, we propose one key module for each sub-task: the Texture Preserving Block (TPB), the HSV Color Feature Embedding Module (HSV-CFEM), and the Geometry Reconstruction Module (GRM). These modules enable MCFNet to methodically tackle spectral translation through a series of escalating resolutions, progressively enriching images with color and texture fidelity in a scale-coherent fashion. The proposed MCFNet demonstrates substantial performance gains on the NIR image colorization task. Code is released at: https://github.com/AlexYangxx/MCFNet.
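The appeal of working in HSV space is that hue and saturation (color) are separated from value (intensity), so color prediction can be decoupled from texture. The minimal sketch below shows that decomposition with Python's standard colorsys module; the small wrapper is illustrative and unrelated to MCFNet's actual implementation.

```python
import colorsys

def rgb_to_hsv(r: int, g: int, b: int):
    """Convert 8-bit RGB to HSV; colorsys works on floats in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

h, s, v = rgb_to_hsv(255, 0, 0)   # pure red: hue 0.0, full saturation and value
```

Under this split, a texture-preserving branch can operate mostly on the V channel while a color-prediction branch supplies H and S, which is the kind of division of labor the module names above suggest.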